Abstract

The DEMOM media object data model aims at providing a uniform framework
for managing different types of media data, i.e. images, text, sound or graphics.
According to DEMOM, media objects are defined as a class hierarchy of objects,
with images, text, sound, and graphics being subtypes of the general type
media object. Representation specific objects are regarded as subordinate
types of the corresponding subtype, e.g. a SUN raster image in pixrect format
is an instance of the subtype pixrect, which is in turn a subtype of image.

Using images as an example we discuss the media object hierarchy, the corre- 
sponding access operations and implementation issues. Content oriented 
search of media data on the basis of predicate calculus is considered as an es- 
sential part of DEMOM and hence discussed as well. 



1.0 INTRODUCTION 

Considerable efforts in data engineering have been put into developing DBMS for stan- 
dard commercial applications, such as accounting, banking and others that basically handle 
alphanumeric data. These conventional DBMS, however, do not provide the functionality that 
is required in forthcoming nonstandard applications like office automation or computer
integrated manufacturing, for instance [Sh88].

In addition to the alphanumeric data the nonstandard applications require the management
of non-alphanumeric data like images, sound, graphics and text, generally referred to as
media data. The combination of alphanumeric and media data is not only necessary for these
applications but is a must for enabling the management of media data themselves [Yo88].
The raw media data like a pixel matrix of an image, for instance, are completely useless
without some alphanumeric data, generally called registration data, that contain information
about the encoding technique used, the colormap and the like.

As a consequence, the underlying data model of a media DBMS has to provide means for 
combining alphanumeric and media data on two levels. Firstly, a direct combination of these 
two kinds of data is necessary in order to make up a usable instance of media data, as the al- 
phanumeric information, defining how to interpret the raw data, is needed for interpretation. 
Secondly, alphanumeric data is frequently required to be combined with media data from the 
application point of view. An example of such a combination is combining picture and sound 
recordings with alphanumeric data in an annotated slide projection. 

A reasonable basis for the integration of alphanumeric data and media data is provided by 
the concept of objects. Different data (alphanumeric as well as bitmap types, for instance) 
that form a logical unit are composed to form a more complex unit, referred to as composite or 
complex object. An important feature of objects is the encapsulation of data in the sense that 
an object hides the implementation of an encapsulated data structure by providing a set of op- 
erations on these data structures, representing the only way of accessing and manipulating 
the data structures. As mentioned above, image or sound data, for instance, that occur as bit- 
maps, require registration data in order to enable a proper interpretation of the bitmaps. 
Hence, it is quite natural to combine these registration data and bitmaps into complex ob- 
jects, generally referred to as media data objects or media objects, for short. For the 
combination of different media objects the terms multimedia object or mixed media object 
have been coined. Efforts are being made to develop multimedia database management
systems [WK87, MLW88] that allow the users to connect media objects to alphanumeric data,
e.g. a photo of a person and his/her name, birth date, address, etc. Thus, a media object is a
complex object by nature as it is necessarily composed of subordinate data objects of simple
or complex types. Objects that contain only raw data of a single medium are referred to as
media objects. Such media objects, containing either raw data of the same medium or of
different media, can be combined to form a new, more complex object. This resulting object
generally is called a multimedia object.

Regarding the environment in which a multimedia object system may be needed, it is very 
likely that it will be a heterogeneous system. This development can be observed in almost 
every computerized application domain. That implies the coexistence of many different media 
object models. Already today there is a considerable variety of different data
models available for any type of media object, i.e. there are lots of image, text and graphics
data models on the market. One may ask, then, why another model is needed. The answer
is that these models are too application specific and are not generalizable. The
operations on media objects are extremely application specific and should therefore form a
part of the application itself and not of a media object management system. We need a media
object model on a conceptual level that takes into account the uniformity of media objects
from the structural point of view. This model must be flexible and extensible to provide a
reasonable basis for integrating media objects into multimedia objects.

Hence, in the rest of this paper we will discuss a generalized model for media objects that 
supports the combination of alphanumeric data and media data in the two ways as described 
above and that fulfills the requirement of extensibility and flexibility. We start with an overall 
description of our media object model DEMOM (DEscription based Media Object data Mod- 
el) which is supposed to be applicable to the different types of media data as listed above, 
followed then by a brief description of the media object data types that are specific for the
different media. A focal point of interest is content-oriented search for media data, which we
consider to be crucial.

2.0 THE DEMOM MEDIA OBJECT MODEL

Two ways are generally in use for the management of complex objects: either one can ex- 
tend an existing data model towards complex objects as has been done with the relational 
model in POSTGRES [SR86], for instance, or one takes an object oriented approach like 
[WK87]. In the first case, the requirement of providing support for conventional and non- 
standard applications is basically satisfied. However, there are further operations required 
that are not provided by the relational model and must be developed for such a system. The 
second approach is based on an object oriented DBMS that is able to store objects of
arbitrary length. In currently existing systems (cf. [MSOP86]) the set of operations defined for
manipulating complex objects is very general in nature and does not take into account the
specific requirements of certain object types. Thus it is still up to the "user" of the object
oriented DBMS (here the application programmer) to write the functions that are needed for
managing the objects of a specific application under consideration.

2.1 Basic Model 

As outlined in the introductory part, one needs a data model for complex objects that is 
not bound to a certain type of medium but is general enough to be uniformly applicable to dif- 
ferent types of media objects. As a consequence, we have concentrated on a model in which 
a media object consists of three parts: 

- registration data, 

- raw data, 

- content-description data. 

The registration data contain the object identification, a set of common registration data
like ownership and access rights and a set of media specific information like a colormap,
height, width, and pixel depth for an image, or sampling rate and encoding type for a sound,
for instance. Although the internal structure of media data heavily depends on the media
type, registration data as such are necessary for all media to enable the correct interpretation
of the raw data. The raw data section is a bitmap whose internal structure is disregarded.

The content-description data section, or description for short, contains a natural language 
description of the object represented by the raw data. This description aims at supporting 
content search of the media data. 

In order to preserve the consistency of related registration data, raw data, and description,
these different data sections are encapsulated. That is, a media object represents an abstract
data type that is only accessible through media object type specific operations (figure 2-1).



[Figure 2-1: Media Object Model. The export interface (media object functions)
encapsulates the registration data, the raw data, and the contents description.]
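
The three-part structure and its encapsulation can be sketched in a few lines of Python. This is a minimal illustration and not part of DEMOM: the class name MediaObject, its method names, and the sample values are assumptions made for this example only.

```python
from dataclasses import dataclass


@dataclass
class MediaObject:
    """Illustrative three-part media object; all names are assumptions."""
    _registration: dict        # e.g. {"owner": "alice", "height": 2}
    _raw: bytes                # uninterpreted raw data
    _description: str = ""     # natural language contents description

    # Export interface: the only intended way to reach the encapsulated parts.
    def get_registration(self, key):
        return self._registration[key]

    def get_raw(self):
        return self._raw

    def get_description(self):
        return self._description

    def replace_description(self, text):
        self._description = text


photo = MediaObject({"owner": "alice", "height": 2, "width": 2}, bytes(4))
photo.replace_description("An aerial photo shows a freeway.")
```

Python does not enforce privacy; the leading underscores only signal that callers are expected to go through the exported methods, which is the discipline the model asks for.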

2.2 Object Oriented Representation of Media Objects 

In terms of an object oriented approach DEMOM can be characterized as follows: 

A media object mo_i is an instance of a class MO that consists of a name,
common registration data, raw data, and a description. The class MO provides the
aforementioned methods for manipulating the components of class members.

Media type specific classes can again contain subclasses that represent specific formats
of the corresponding media type. For each media type there exists a wide variety of different
formats. Almost every WYSIWYG (What You See Is What You Get) text processing
tool has its own internal representation format. The same holds for graphics tools
and for images that depend more on the underlying hardware. All such formats are
considered as subclasses of the media type class they belong to. The subclasses thus inherit the
structure from their superclasses and, in addition, have their own part of registration data
that are necessary for the interpretation of the raw data.

Figure 2-2 illustrates a sample MO class hierarchy. The acronyms IMG, SND, TXT, and
GRP stand for the media object subclasses of images, sound, text, and graphics, respectively.
IMG, SND, TXT or GRP are subclasses of MO, i.e. their relationship to MO is
characterized as an "IS_A" relationship. Instances of these subclasses inherit the structure of
MO (type inheritance) and have in addition a component that contains media subtype specific
registration data. For the manipulation of these additional data a corresponding set of methods
is provided.

Furthermore, figure 2-2 shows the data structures encapsulated on each level of the
hierarchy. MOID is a media object’s systemwide unique identification. MO_type specifies the
subclass to which the media object belongs. MO_Name contains a symbolic name a user
may assign to a media object. Ownership or access rights and the like are regarded as typical
registration data (Common_RegData), common to all types of objects. RawData contains
the real raw media data part. A natural language based description of a media object’s
content is maintained in DescrData.

The IMG subclass is further detailed into the classes PIX, ALV and URL. PIX stands for
the SUN/Pixrect format [SUN86], ALV is a raster image format developed at Brown
University, and URL stands for Utah Run Length Encoded, an image format developed at the
University of Utah [PBT86].

[Figure 2-2: Sample Media Object Class Hierarchy. Arrows denote the "IS_A" relationship.
Format specific registration data shown for the IMG subclasses:
PIX: PIX_Type, PIX_Encoding, PIX_Colormap;
ALV: ALV_Greyscale;
URL: URL_CRTpos, URL_Channels, URL_Flags, URL_Colormap, URL_Comments.]

In addition to the common registration data a media object type, i.e. a subclass of MO, has
its specific registration data. Typical examples of IMG specific registration data are the
height, width and pixel depth of an image; a typical example for SND objects is the sampling
rate. These registration data are included in the IMG_RegData or SND_RegData components,
respectively.

For the subclasses of IMG some subclass specific registration data are listed. They
provide colormap information, for instance, and other representation specific information. Thus,
the registration data section, shown in figure 2-1, is made up of common registration data,
media type specific registration data, and format specific registration data.

The hierarchy described here is not restricted to the form depicted. Instead, new media
types can be added. Potential candidates, for instance, are signals and videos [Lo88]. Due to
the high level of abstraction of DEMOM such kinds of media objects can be integrated readily.
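
The layering of registration data along the "IS_A" hierarchy, and the way a new media type can be added, can be illustrated as follows. This is a hedged sketch in Python: the field names loosely follow the component names in the text, while the class VID, the helper registration_data, and all sample values are assumptions of this example, not part of DEMOM.

```python
from dataclasses import dataclass, field


@dataclass
class MO:
    moid: int
    mo_name: str
    raw_data: bytes
    descr_data: str = ""
    common_reg_data: dict = field(default_factory=dict)   # ownership, access rights


@dataclass
class IMG(MO):                                            # IMG IS_A MO
    img_reg_data: dict = field(default_factory=dict)      # height, width, depth


@dataclass
class PIX(IMG):                                           # PIX IS_A IMG
    pix_reg_data: dict = field(default_factory=dict)      # encoding, colormap


@dataclass
class VID(MO):                                            # hypothetical new media type
    vid_reg_data: dict = field(default_factory=dict)


def registration_data(mo):
    """Merge the registration data contributed by every level of the hierarchy."""
    merged = {}
    for name, value in vars(mo).items():
        if name.endswith("reg_data"):
            merged.update(value)
    return merged


img = PIX(1, "skyline", bytes(6),
          common_reg_data={"owner": "alice"},
          img_reg_data={"height": 2, "width": 3, "depth": 8},
          pix_reg_data={"encoding": "RT_STANDARD"})
```

Here isinstance(img, IMG) plays the role of a type check such as MO_IMG, and adding VID required no change to MO or to the existing subclasses.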

2.3 Methods on Media Objects and Subclasses 

We assume that a media object is identified by a systemwide unique object identifier
MOID of type moid. All operations have this identifier as their first argument.

class MO (subclass of OBJECT)

void method MO_remove (MOID)

The media object denoted by MOID is removed from the set of media objects.

void method MO_add_descr (MOID, *char)

This method adds a description, pointed to by *char, to the media object identified by
MOID.

void method MO_replace_description (MOID, *char)

This method replaces the description of media object MOID with the description
pointed to by *char.

*char method MO_get_description (MOID)

For the media object denoted by MOID the related natural language description is
returned.



The following is a group of methods that operate on common registration data. As long as
we have not precisely defined what the common registration data are, we use non-terminal
symbols for the definition of the method group.

<commonregdata> method MO_get_<commonregdata> (MOID)

This set of methods returns common registration data of a media object MOID. As
potential registration data that are common to all media objects we envisage access
rights, ownership, date of creation, date of last modification and the like.

As a counterpart, we envisage the following group.

void method MO_set_<commonregdata> (MOID, <commonregdata>)

This set of methods enables the modification of common registration data.

Finally, we have a set of boolean methods, each of which is related to a subtype of MO.
Each method returns the value ’TRUE’ if the object identified by MOID is of the type that is
checked.

boolean method MO_IMG (MOID) 
boolean method MO_SND(MOID) 



In order to enable operations on media objects of different types we support methods on
the class MO, or more precisely methods on sets that belong to the class MO (cf.
GemStone and OPAL [MSOP86]). In particular, we think of content based retrieval. The example
"Give me a list of all media objects that are related to Beethoven" illustrates this kind of
application. The multimedia system can contain a picture of Beethoven, a book or letters of
Beethoven, and a set of sound recordings of Beethoven’s music, for instance.

class MO_SET (subclass of SET)

select (mo_setid1, [x: MO_contents(x, Description)])

By means of the Description argument, which contains the natural language description
of the objects that the caller is looking for, a (possibly empty) set of media objects is
identified and the list of identifiers is returned.

select (mo_setid2, [x: MO_by_<commonregdata>(x, <commonregdata>)])

This set of methods has as parameter a pointer to the input value for the corresponding
common registration data type. Thus a set of media objects can be retrieved by
means of their registration data. The (possibly empty) list of identifiers is returned by
these methods.
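
The set-level select can be pictured as filtering a collection with a predicate and returning identifiers. The sketch below is in Python and makes several assumptions: media objects are plain dictionaries, mo_contents uses a simple substring test as a stand-in for the predicate based matching of section 3, and all sample data are invented for illustration.

```python
def select(mo_set, predicate):
    """Return the identifiers of all media objects that satisfy predicate."""
    return [mo["moid"] for mo in mo_set if predicate(mo)]


def mo_contents(term):
    # Stand-in only: substring test instead of description matching.
    return lambda mo: term.lower() in mo["descr"].lower()


def mo_by(regdata_name, value):
    # Retrieval by a common registration datum such as ownership.
    return lambda mo: mo.get(regdata_name) == value


media = [
    {"moid": 1, "type": "IMG", "owner": "alice", "descr": "A portrait shows Beethoven."},
    {"moid": 2, "type": "SND", "owner": "bob", "descr": "An orchestra plays a symphony of Beethoven."},
    {"moid": 3, "type": "TXT", "owner": "alice", "descr": "A letter describes Vienna."},
]

about_beethoven = select(media, mo_contents("Beethoven"))   # objects of different types
owned_by_alice = select(media, mo_by("owner", "alice"))
```

The result lists may be empty, as the text notes; select(media, mo_contents("Mozart")) returns an empty list for this sample data.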

The subclasses of MO (i.e. IMG, SND, TXT and GRP) inherit these methods and add
their own MO type specific ones. As an example we regard image specific methods on
image specific registration data.

Raster images are characterized through their height, width and pixel depth, i.e. the
number of bits used for the color or greyscale definition of a pixel. Consequently, this
information constitutes an image’s registration data. That in turn means that we have to
provide methods on these data if we intend to use them for retrieval, for instance. As this
information is tightly coupled with the raw data part of an image, we cannot modify the
registration data without modifying the raw data and vice versa. As a consequence, we provide
read-only operations and leave the consistent update of raw data and registration data to the
application, i.e. update-in-place is not supported; instead a new version of an image has to
be created.

class IMG (subclass of MO)

int method IMG_get_height (MOID),
int method IMG_get_width (MOID),
int method IMG_get_depth (MOID)

These methods return the height, width or pixel depth, respectively, of the image
denoted by MOID.
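
One way to picture read-only registration data combined with versioning instead of update-in-place is an immutable record plus a function that builds a new version. The following Python sketch is an assumption-laden illustration: the class Image, the function new_version, and the consistency check (unpadded rows, pixel depth a multiple of 8) are not taken from DEMOM.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)          # frozen: attributes cannot be reassigned
class Image:
    moid: int
    height: int
    width: int
    depth: int                   # bits per pixel
    raw: bytes


def new_version(img, new_moid, height, width, raw):
    """Create a new image object whose registration data agree with raw."""
    # Assumption: rows are stored unpadded and depth is a multiple of 8.
    if len(raw) != height * width * img.depth // 8:
        raise ValueError("raw data do not match height, width and depth")
    return replace(img, moid=new_moid, height=height, width=width, raw=raw)


v1 = Image(moid=1, height=2, width=2, depth=8, raw=bytes(4))
v2 = new_version(v1, new_moid=2, height=1, width=2, raw=bytes(2))
```

The original object v1 is left untouched, and an attempt to assign to v1.height raises an error, which mirrors the read-only character of the IMG_get methods.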

In a similar way we can define methods on media object type specific registration data of 
SND, TXT or GRP objects. 

Specific formats of these media object types are again subclasses that inherit the methods
defined for the corresponding type and add their format specific methods. The specific
format materializes again as an extension of the registration data part that is needed for the
proper interpretation of the specific raw data format. Typical information on this level is
encoding information, e.g. text formatting information or data compression information. Referring
to the IMG subclasses, depicted in figure 2-2, we see that typical registration data on this
level are colormap or greyscale information and the encoding techniques used. For these data it
is difficult to determine beforehand whether updates can be allowed or not. The encoding
technique used, for instance, is again tightly coupled with the raw data and hence updating it
without modifying the raw image would violate the object’s consistency. Colormap information,
on the other hand, could be modified without modifying the corresponding raw image. In this
case it depends on the application whether update possibilities are desired or not. As a
consequence the methods defined on this level can only have the character of an example, not
more. Therefore, we restrict ourselves to the presentation of methods on images in pixrect
format.

class PIX (subclass of IMG) 

ENCODING method PIX_get_encoding (MOID) 

This method returns the encoding type used, i.e. RT_OLD, RT_STANDARD or
RT_BYTE_ENCODED.



int method PIX_get_cmap_len (MOID) 

This method returns the colormap length of image MOID. 

int method PIX_get_cmap_entrysize (MOID) 

This method returns the size of an entry in MOID’s colormap in bytes. A common
size for RGB encoded images is 3 bytes, for instance, i.e. one byte per color channel.
That provides for 256 intensity levels per channel.

cmap method PIX_get_cmap (MOID) 

This method returns the entire colormap of a pixrect image. 

void method PIX_put_cmap (MOID, cmap) 

As a counterpart to the preceding method, this one substitutes the current colormap 
of MOID by the one given as parameter cmap. 

This list is not complete but aims at giving an idea of what the methods on the lower level
look like. As in object oriented systems in general, this list can easily be extended or
adapted to the specific needs of a particular application. In a similar way the methods for other
media types can be defined.



3.0 CONTENT SEARCH 

Storing media data in a computer is not a problem; how to query the content of this data is.
For example, if a witness wants to search the digitized criminal mug-shots to identify a
suspect who has a mean-looking face with a banana nose and beady eyes, the digitized images
alone will not help much because extracting those features from an image is complex. A major
difficulty in handling multimedia data is the richness of its semantic content. A number 100
associated with the attribute of ’loan balance’ can mean very few things. An image of a
person, on the other hand, implicitly contains a great deal of information. For images in natural
settings, the implications are still more complex, and for video data even more so.

As already indicated by the description component of DEMOM we follow the approach of 
contents based search by means of verbal descriptions that form a part of a media object 
[LM89]. That does not mean, however, that we limit ourselves to this approach exclusively. 

3.1 Rationale for Limited Natural Language Descriptions 

A well-known approach to content description is the keyword approach as used in library
information retrieval. But keyword search techniques have been demonstrably imprecise,
except for simple applications, and users have often had great difficulty in focusing the search on
documents of interest. The problem of keyword search is that keywords are discrete and no
associations between keywords are specified.

Graphics objects, in general, have an internal structure (e.g. boxes, circles etc.) so that 
pattern matching algorithms become applicable. The same is true for relatively simple struc- 
tured image and sound objects. In such cases it might be appropriate to use pattern matching 
algorithms. 

The problem with image, sound, and graphics objects is that if the objects under
consideration have very complex structures and are rich in semantics, tools for contents analysis
become very complex and extremely time consuming and are thus not suitable for retrieval in
a multimedia database environment.

The attachment of keywords to non-textual data objects is a first step that enables
the application of text retrieval mechanisms to non-textual data as well. As pointed out above,
keywords lack more complex linking mechanisms to adequately capture the contents of
objects like aerial photos, for example. Hence, the use of natural language descriptions seems
to be a more viable solution. Full understanding of natural languages is not yet achievable,
but caption understanding needs only a subset of its techniques. The studies in the areas of
natural-language understanding, expert and rule-based systems, and knowledge representation
and organization are utilized to help reduce the content search problem in multimedia
data to the simpler caption-analysis problem. Captions are a natural but special, stylized
way of writing descriptions with a subset of natural language.

Hence, our approach aims at applications where the aforementioned strategies fail, i.e.
where media objects contain a high degree of semantics that cannot be, or at least is not yet,
covered by the current more "syntax-oriented" approaches. It is up to the media DBMS
architecture to provide for the coexistence of different retrieval strategies. As a major feature
of object oriented systems is the extensibility of the set of methods that goes with an object,
we can easily imagine having "syntax-oriented" methods and semantics based methods
together in the same object.

3.2 Scope Limitations of Natural Language Descriptions 

Natural language descriptions have the advantage that everyone is familiar with a natural
language and therefore one can expect low resistance to acceptance. But that does
not automatically solve the problems of description understanding and matching of
descriptions and queries. To be successful, focusing on specific application domains is necessary.
Narrowing down on a particular application generally is accompanied by restricting the
universe of discourse. Further, making assumptions about the user or limiting the capabilities of
the interface the user is provided with enables us to reduce the variety of syntactical structures
that the natural language processing component must be able to handle.

3.2.1 General Syntax Restrictions 

As descriptions are to provide us with facts about the raw data content, one does not
need the full capability of a natural language. Hence, we made several restrictions with
respect to the grammar of captions that our system accepts.

The first restriction is to limit the natural language to declarative statements. Remember,
the reason for natural language processing in this context is the advantage of content
description compared to object recognition or keywords. That does not automatically imply the
use of all types of grammatical structures. Descriptions in general are of declarative nature.

Further, as style is not a matter of importance, we limit the declarative statements to
active voice and to certain grammatical structures. Such a restriction
is not believed to cause handicaps or a loss of power in describing the contents of
the data but reduces the problem to be solved.

Another restriction is related to pronouns. As the system is supposed to be used by multi- 
ple users we can limit the use of pronouns and verbs to 3rd person singular and plural. 

In addition to these basic restrictions we also defined a grammar that limits the flexibility 
that is normally available in a natural language. Occurrences of prepositional phrases and 
participial phrases, for instance, are strictly regulated. A detailed discussion of our grammar 
is beyond the scope of this paper. For a complete description we refer to [Du90]. 

3.2.2 Limiting the Universe of Discourse 

As a database generally is restricted to a specific application, the vocabulary and
its interpretation, as well as the interpretation of the descriptions and their semantics,
are naturally constrained to a narrow domain of discourse. This means the system does not
have to be able to understand everything that can be written in a natural language. The
model and the knowledge needed for understanding are therefore much smaller and become
manageable. To handle this problem our approach is based on an extended feature based
dictionary (cf. [GM89]), in which the users define the domain of each application, thus restricting
the vocabulary and its meanings, the semantics, the knowledge and the model for the system
to apply.

3.3 Coping with Locations and Time

Captions make extensive use of place and time descriptions. Thus, natural-language
understanding of captions requires detailed hierarchies of place names and time interval names.
The hierarchy will necessarily be tangled, since there can be alternative generalizations of a
term. Efficient access to place and time interval names is crucial since there are many ways
to site the same image in time and space, and it will be infrequent that the "standard
description" will be identical to what the querier of a system wants to call it.

Hierarchical relations between location names are stored as parts of the location name
entries in the dictionary. At description processing time these relations are resolved by means
of inference rules. Thus a query like "show me all aerial photos of Californian cities" results
in displaying photos of San Francisco, Los Angeles, San Diego etc. as their dictionary
entries say that they are a ’PART_OF’ California.
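
Resolving 'PART_OF' relations over a tangled hierarchy amounts to a reachability test in a graph where a place may have several parents. A minimal sketch in Python, under the assumption that dictionary entries can be modelled as a mapping from a place name to its parent regions; the entries for "Bay Area" and "Seattle" and the photo identifiers are invented for illustration.

```python
# Toy dictionary entries; "Bay Area" and "Seattle" are additions for illustration.
PART_OF = {
    "San Francisco": {"California", "Bay Area"},   # two alternative generalizations
    "Bay Area": {"California"},
    "Los Angeles": {"California"},
    "San Diego": {"California"},
    "Seattle": {"Washington"},
}


def is_part_of(place, region, seen=None):
    """True if region is reachable from place via PART_OF links."""
    seen = set() if seen is None else seen
    for parent in PART_OF.get(place, ()):
        if parent == region:
            return True
        if parent not in seen:          # guard against revisiting shared ancestors
            seen.add(parent)
            if is_part_of(parent, region, seen):
                return True
    return False


photos = {10: "San Francisco", 11: "Los Angeles", 12: "Seattle"}
hits = sorted(moid for moid, city in photos.items() if is_part_of(city, "California"))
```

With this sample data the query for Californian cities selects photos 10 and 11 but not 12.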

Time expressions are treated in the same way. 

3.4 Description Predicates and Their Representation 

Since natural-language captions require time and an arsenal of techniques to analyze, we
parse captions during entry of their associated media objects into the database, and store the
parse results in the database for access to the media objects. Since most parsing methods
create a predicate calculus expression representing the meaning of some natural language
(cf. [Wi84], [GM89]), which almost always is the logical conjunction of a large number of
terms, we represent the meanings of captions as lists of literals in Prolog notation. Inclusion
of a parser in our system also means we can accept natural language queries to the
database. Their descriptions can then be matched to the descriptions of all media objects in the
database to find all matches to the query.

The imprecision and ambiguity of the natural language descriptions is reduced
considerably by transforming them into a set of predicates. These predicates state facts about the
real-world objects involved in the media object. Real-world objects and activities are
referred to through their name, or identifier. The predicates state their properties - as indicated
by the media object - and their relationships. In many cases, the name of the object may not
be known, so that artificial identifiers have to be created. For instance, an image showing a
car can be described by the predicates "car (x), manufacturer (x, Horch), year_built (x,
1922)", etc. The use of the name connects different predicates that state properties of the
same object.

The parser uses a dictionary of all the words it can recognize, and this dictionary also 
shows the predicates to use when a word appears in the description. Hence, the set of all 
predicates that can be used in the descriptions, is defined in the dictionary. 

3.5 Matching 

The result of parsing is one set of predicates per media object, interconnected by object 
identifiers. A query is also entered in natural language and then parsed. In contrast to the de- 
scription, the arguments of the query predicates can be variables. A media object is selected 
for the result of the query, if and only if there exists a binding of those variables to object 
identifiers such that the description predicates of the media object logically imply all the que- 
ry predicates. 
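
The selection condition can be made concrete with a small backtracking matcher. The Python sketch below rests on stated assumptions: literals are tuples of the form (predicate, arg, ...), query variables are strings starting with "?", and logical implication is simplified to "every query literal equals some description literal under one common binding", i.e. no inference rules are applied. The second car and its manufacturer are invented sample data.

```python
def is_var(term):
    # Assumed convention of this sketch: variables are strings starting with "?".
    return isinstance(term, str) and term.startswith("?")


def match(query, facts, binding=None):
    """Return a binding that turns every query literal into a fact, or None."""
    binding = {} if binding is None else binding
    if not query:
        return binding
    first, rest = query[0], query[1:]
    for fact in facts:
        if fact[0] != first[0] or len(fact) != len(first):
            continue
        trial = dict(binding)
        for q_arg, f_arg in zip(first[1:], fact[1:]):
            if is_var(q_arg):
                if trial.setdefault(q_arg, f_arg) != f_arg:
                    break               # variable already bound to another identifier
            elif q_arg != f_arg:
                break                   # constants differ
        else:
            result = match(rest, facts, trial)
            if result is not None:
                return result           # otherwise backtrack to the next fact
    return None


description = [
    ("car", "x1"), ("manufacturer", "x1", "horch"), ("year_built", "x1", 1922),
    ("car", "x2"), ("manufacturer", "x2", "audi"),
]
found = match([("car", "?c"), ("manufacturer", "?c", "audi")], description)
missing = match([("car", "?c"), ("year_built", "?c", 1930)], description)
```

The first query succeeds only after backtracking from x1 to x2; the second has no consistent binding, so the media object would not be selected.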

The match of user query to database media object need not be exact. A set of rules, some 
of which must be domain dependent, specifies situations in which sets of literals that look dif- 
ferent are really the same. Reasoning about type hierarchies is a simple but important 
example; for instance, a query that specifies a road should match a caption that specifies a 
freeway. However, the inverse may not be valid, i.e. freeway is a special kind of road and the 
distinction may be necessary. Another example is inference of containment of one time
interval in another; for instance, a query that mentions a date between May 15th and June 30th
should match a caption that specifies June 1st. Another important example is reasoning
about physical relationships; for instance, a query that mentions a forest west of a road
should match a caption that specifies a road east of a forest.

The matching catches different natural language phrases with the same meaning, but it
does not catch semantic relationships among predicates. If the description for an image is "a
car with a red body", the predicates generated will be something like "car (x), component (x,
y), body (y), color (y, red)". A query that asks for "a red car" is translated into something
like "car (x), color (x, red)", and there would be no match. The system does not know that
the color of a car’s body is just the same as the color of the car.

To overcome this problem, rules can be introduced that express the semantic relationships 
among the predicates. In our example, the rule could be: 

if car (A), component (A, B), body (B), color (B, C) then color (A, C). 

Using this rule, color (x, red) can be inferred in the example, and thus the query would
match the description. This is similar to the use of S-rules in the START system [Ka88].
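
The effect of the rule can be traced with a hand-coded forward inference step. The Python sketch below is an illustration only: the rule is written directly as a function rather than interpreted from a rule language, literals are tuples of the form (predicate, arg, ...), and the function names are assumptions of this example.

```python
def apply_body_color_rule(facts):
    """if car(A), component(A, B), body(B), color(B, C) then color(A, C)."""
    facts = set(facts)
    cars = {f[1] for f in facts if f[0] == "car"}
    bodies = {f[1] for f in facts if f[0] == "body"}
    derived = set()
    for comp in facts:
        if comp[0] == "component" and comp[1] in cars and comp[2] in bodies:
            for col in facts:
                if col[0] == "color" and col[1] == comp[2]:
                    derived.add(("color", comp[1], col[2]))
    return facts | derived


def is_red_car(facts):
    """Query car(X), color(X, red): is there a binding for X?"""
    return any(f[0] == "car" and ("color", f[1], "red") in facts for f in facts)


description = [("car", "x"), ("component", "x", "y"), ("body", "y"), ("color", "y", "red")]
before = is_red_car(description)                          # no match without the rule
after = is_red_car(apply_body_color_rule(description))    # match once color(x, red) is derived
```

Applying the rule adds the literal color (x, red) to the description predicates, after which the query for "a red car" finds a binding.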

A key unsolved problem, only really serious with multimedia data, is which literals to try
to generalize to get a match, and how far to generalize. Domain-dependent knowledge can
help, but we believe there are undiscovered general principles.

4.0 CONCLUSION 

The handling of multimedia data imposes new requirements on database management 
systems, especially when regarding the integration support of conventional and multimedia 
data. In this paper we present an approach that aims at easily integrating conventional alpha- 
numeric and multimedia data by providing the object-oriented DEMOM media object model. 

A media object is designed as encapsulating raw data, registration data and, in contrast to 
other projects in the area of multimedia DBMS research, contents-description data. These 
data can only be accessed through object specific operations that also form a part of the en- 
tire media object. 

The raw data contain the byte stream whose potential internal structure is disregarded.
Thus, for the reasonable interpretation of the raw data some registration information is
necessary. The additional contents-description data are introduced for supporting contents
search. Internally, these descriptions are translated into description predicates by means of a
predicate calculus.

Due to the general nature of our object model it is applicable to the different kinds of media
without modification. Hence we designed our multimedia DBMS as being composed of a set
of media DBMS which have a uniform gross architecture (cf. [Lo88]). The multimedia
DBMS architecture sketched there looks very similar to that of the MUSE multidatabase
system [Ho88] that is implemented as a decentralized system. The correspondence in the
architecture and the flexibility of the MUSE system make it an ideal candidate for the
integration of the media DBMS. As described in [Ho90], MUSE’s transaction concept also
provides the flexibility and extensibility that is required for multimedia object management.
Consequently, on this basis we can also envisage the multimedia DBMS as being distributed.

At present we have a prototype implementation for image objects in an image
DBMS [Th88]. The prototype supports images in SUN pixrect format and in ALV format.
The system is running on SUN under OS 4.0.1. It is written in C and uses the Ingres
relational database system for managing the registration data of image objects.

A major research issue that will be tackled in the future is the management of description
predicates. Logically the predicates are stored and indexed in the database to facilitate query
processing. The data structure that is best for this purpose is not known. Normal indexing
methods used in conventional database systems, where each predicate is followed by a list of
data instances containing that predicate, are found to be inadequate, as some of the predicate
terms may not be very selective, i.e. some predicates may be associated with a very large
number of media data instances. Trade-offs involving detailed data structures and the
processing strategies must be analyzed carefully to draw any concrete conclusions.
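
The selectivity problem of the conventional index can be seen on a toy example. The Python sketch below builds the kind of index the paragraph calls inadequate, one list of media objects per predicate, and measures what fraction of the database each list covers; the sample descriptions and all names are assumptions made for illustration, and no alternative data structure is proposed here.

```python
from collections import defaultdict


def build_index(descriptions):
    """Map each predicate name to the set of media objects whose description uses it."""
    index = defaultdict(set)
    for moid, literals in descriptions.items():
        for literal in literals:
            index[literal[0]].add(moid)
    return index


descriptions = {
    1: [("car", "a"), ("color", "a", "red")],
    2: [("car", "b"), ("manufacturer", "b", "horch")],
    3: [("car", "c"), ("color", "c", "blue")],
    4: [("road", "d")],
}
index = build_index(descriptions)
# Fraction of all media objects listed under each predicate (1.0 = not selective at all).
coverage = {name: len(moids) / len(descriptions) for name, moids in index.items()}
```

In this sample the predicate car is listed for three of the four objects, so looking it up narrows a search very little, whereas manufacturer is far more selective.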